
    A Spatio-Temporal Framework for Managing Archeological Data

    Space and time are two important characteristics of data in many domains. This is particularly true in the archaeological context, where information concerning the discovery location of objects allows one to derive important relations between findings of a specific survey, or even of different surveys, and where time aspects extend from the excavation time to the dating of the archaeological objects. In recent years, several attempts have been made to develop a spatio-temporal information system tailored for archaeological data. The first aim of this paper is to propose a model, called Star, for representing spatio-temporal data in archaeology. In particular, since in this domain dates are often subjective, estimated and imprecise, Star incorporates such vague representations by using fuzzy dates and fuzzy relationships among them. Moreover, besides topological relations, another kind of spatial relation is particularly useful in archaeology: the stratigraphic one. Therefore, this paper defines a set of rules for deriving temporal knowledge from the topological and stratigraphic relations existing between two findings. Finally, considering the process through which objects are usually manually dated by archaeologists, some existing automatic reasoning techniques may be successfully applied to guide such a process. For this purpose, the last contribution regards the translation of archaeological temporal data into a Fuzzy Temporal Constraint Network for checking the overall data consistency and for reducing the vagueness of some dates based on their relationships with other ones.
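
    To make the notion of fuzzy dating more concrete, the following minimal sketch models a date as a trapezoidal fuzzy interval and computes a rough possibility degree that one dating precedes another. It is an illustration of the general idea only, not the Star model's actual representation or API; all names and the scoring formula are assumptions.

```python
# Illustrative sketch of fuzzy dating (not the Star model's actual API).
# A fuzzy date is a trapezoid (a, b, c, d): membership rises from 0 at a
# to 1 at b, stays 1 until c, and falls back to 0 at d.
from dataclasses import dataclass

@dataclass
class FuzzyDate:
    a: float  # earliest possible year
    b: float  # start of fully plausible range
    c: float  # end of fully plausible range
    d: float  # latest possible year

    def membership(self, year: float) -> float:
        """Degree (0..1) to which 'year' is a plausible dating."""
        if year < self.a or year > self.d:
            return 0.0
        if self.b <= year <= self.c:
            return 1.0
        if year < self.b:
            return (year - self.a) / (self.b - self.a)
        return (self.d - year) / (self.d - self.c)

def possibly_before(d1: FuzzyDate, d2: FuzzyDate) -> float:
    """Rough possibility degree that d1 precedes d2 (1 = certainly compatible,
    0 = impossible); a crude stand-in for a fuzzy temporal relation."""
    if d1.d <= d2.a:          # supports are disjoint and correctly ordered
        return 1.0
    if d1.a >= d2.d:          # d1 lies entirely after d2
        return 0.0
    # Overlapping supports: scale by how much of the combined span precedes d2's end.
    return (d2.d - d1.a) / ((d1.d - d1.a) + (d2.d - d2.a))

# Example: a grave dated 120-180 AD (core 130-170) vs. a layer dated 150-250 AD.
grave = FuzzyDate(120, 130, 170, 180)
layer = FuzzyDate(150, 170, 230, 250)
print(possibly_before(grave, layer))
```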

    A framework for integrating multi-accuracy spatial data in geographical applications

    In recent years the integration of spatial data coming from different sources has become a crucial issue for many geographical applications, especially in the process of building and maintaining a Spatial Data Infrastructure (SDI). In such a context, new methodologies are necessary in order to acquire and update spatial datasets by collecting new measurements from different sources. The traditional approach implemented in GIS systems for updating spatial data does not usually consider the accuracy of these data, but simply replaces the old geometries with the new ones. The application of such an approach in the case of an SDI, where continuous and incremental updates occur, will very soon lead to a spatial dataset that is inconsistent with respect to spatial relations and relative distances among objects. This paper addresses this problem and proposes a framework for representing multi-accuracy spatial databases, based on a statistical representation of the objects' geometry, together with a method for the incremental and consistent update of the objects that applies a customized version of the Kalman filter. Moreover, the framework also considers the spatial relations among objects, since they represent a particular kind of observation that can be derived from geometries or observed independently in the real world. Spatial relations among objects also need to be compared during spatial data integration, and we show that they are necessary in order to obtain a correct result when merging the objects' geometries.
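
    The sketch below shows the standard, static-state Kalman update that such an accuracy-aware merge builds on: an existing coordinate and a new measurement are fused, weighted by their covariances. It is not the paper's customized filter (which also handles spatial relations); the function name and the example figures are illustrative.

```python
import numpy as np

def kalman_point_update(x_old, P_old, z_new, R_new):
    """
    Fuse an existing point position with a new, independently measured one.
    x_old, z_new : 2-vectors (x, y coordinates)
    P_old, R_new : 2x2 covariance matrices expressing their accuracy
    Returns the updated position and covariance (static-state Kalman update).
    """
    x_old, z_new = np.asarray(x_old, float), np.asarray(z_new, float)
    P_old, R_new = np.asarray(P_old, float), np.asarray(R_new, float)
    K = P_old @ np.linalg.inv(P_old + R_new)   # Kalman gain
    x_new = x_old + K @ (z_new - x_old)        # accuracy-weighted position
    P_new = (np.eye(2) - K) @ P_old            # reduced uncertainty
    return x_new, P_new

# A survey point known to ~2 m is updated with a fix accurate to ~0.5 m.
x, P = kalman_point_update([100.0, 200.0], np.eye(2) * 4.0,
                           [101.0, 199.0], np.eye(2) * 0.25)
print(x, np.diag(P))
```

    The updated position ends up much closer to the more accurate measurement, and the resulting covariance is smaller than either input, which is exactly the behaviour needed for incremental updates in an SDI.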

    A Balanced Solution for the Partition-based Spatial Merge join in MapReduce

    Several MapReduce frameworks have been developed in recent years in order to cope with the need to process an increasing amount of data. Moreover, some extensions of them have been proposed to deal with particular kinds of information, like the spatial one. In this paper we refer to SpatialHadoop, a spatial extension of Apache Hadoop which provides a rich set of spatial data types and operations. In the geo-spatial domain, the spatial join is considered a fundamental operation for performing data analysis. However, the join operation is generally classified as a critical task to be performed in MapReduce, since it requires processing two datasets at a time. Several different solutions have been proposed in the literature for efficiently performing a spatial join, which may or may not require the presence of a spatial index computed on both datasets or on only one of them. As already discussed in the literature, the efficiency of this operation depends on the ability both to prune unnecessary data as soon as possible and to provide a balanced amount of work to each task executed in parallel. In this paper we take a step forward in this direction by proposing an evolution of the Partition-based Spatial Merge Join algorithm which tries to fully exploit the parallelism provided by the MapReduce framework. In particular, we concentrate on the partition phase, which has to produce filtered, balanced and meaningful subdivisions of the original datasets.
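
    As a rough illustration of what the partition phase does, the snippet below assigns each rectangle (an object's MBR) to every cell of a regular grid it overlaps, replicating geometries that span several cells so that each cell can later be joined independently. This is a deliberately simplified, uniform-grid version, not the balanced partitioning proposed in the paper; all names are illustrative.

```python
# Simplified PBSM-style partition phase: route each MBR to the grid cells it
# overlaps so that the join can then be evaluated cell by cell in parallel.
from collections import defaultdict

def partition(rects, space, nx, ny):
    (sx0, sy0, sx1, sy1) = space
    cw, ch = (sx1 - sx0) / nx, (sy1 - sy0) / ny
    cells = defaultdict(list)
    for r in rects:
        minx, miny, maxx, maxy = r
        ci0 = int((minx - sx0) // cw); ci1 = int((maxx - sx0) // cw)
        cj0 = int((miny - sy0) // ch); cj1 = int((maxy - sy0) // ch)
        for i in range(max(ci0, 0), min(ci1, nx - 1) + 1):
            for j in range(max(cj0, 0), min(cj1, ny - 1) + 1):
                cells[(i, j)].append(r)   # replicate geometries crossing cells
    return cells

parts = partition([(1, 1, 2, 2), (4.5, 4.5, 6, 6)], (0, 0, 10, 10), 4, 4)
print({k: len(v) for k, v in parts.items()})
```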

    An Interoperable Spatio-Temporal Model for Archaeological Data Based on ISO Standard 19100

    Archaeological data are characterized by both spatial and temporal dimensions that are often related to each other and are of particular interest during the interpretation process. For this reason, several attempts have been made in recent years to develop a GIS tailored for archaeological data. However, despite the increasing use of information technologies in the archaeological domain, the current situation is that each agency or research group independently develops its own local database and management application, which remains isolated from the others. Conversely, the sharing of information and the cooperation between different archaeological agencies or research groups can be particularly useful in order to support the interpretation process by using data discovered in similar situations w.r.t. spatio-temporal or thematic aspects. In the geographical domain, the INSPIRE initiative of the European Union supports the development of a Spatial Data Infrastructure (SDI) through which several organizations, such as public bodies or private companies, with overlapping goals can share data, resources, tools and competencies in an effective way. The aim of this paper is to lay the basis for the development of an archaeological SDI, starting from the experience acquired during the collaboration among several Italian organizations. In particular, the paper proposes a spatio-temporal conceptual model for archaeological data based on the ISO standards of the 19100 family and promotes the use of the GeoUML methodology in order to put such interoperability into practice. The GeoUML methodology and tools have been enhanced in order to suit the archaeological domain and to automatically produce several useful documents, configuration files and code starting from the conceptual specification. The applicability of the spatio-temporal conceptual model and the usefulness of the produced tools have been tested in three different Italian contexts: Rome, Verona and Isola della Scala.
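
    Purely as a hypothetical illustration of what a class of such a conceptual model could look like once translated into code, the sketch below pairs an ISO 19107-style geometric extent with a (possibly imprecise) dating interval. The class and attribute names are invented here and do not reproduce the actual GeoUML schema.

```python
# Hypothetical translation of one conceptual class: a stratigraphic unit with
# a surface-like extent and an imprecise dating interval (names illustrative).
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class StratigraphicUnit:
    identifier: str
    extent: List[Point]          # boundary ring of a surface geometry
    dating_from: Optional[int]   # earliest plausible year (None if unknown)
    dating_to: Optional[int]     # latest plausible year
    excavation_year: int
    interpretation: str = ""

unit = StratigraphicUnit("VR-001", [(0, 0), (4, 0), (4, 3), (0, 3)],
                         dating_from=-50, dating_to=120, excavation_year=2012,
                         interpretation="foundation wall")
print(unit.identifier, unit.dating_from, unit.dating_to)
```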

    Skewness-Based Partitioning in SpatialHadoop

    In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data. SpatialHadoop belongs to this group of projects and includes MapReduce implementations of spatial operators, like range queries and spatial join. The MapReduce paradigm is based on the fundamental principle that a task can be parallelized by partitioning data into chunks and performing the same operation on each of them (map phase), eventually combining the partial results at the end (reduce phase). Thus, the applied partitioning technique can tremendously affect the performance of a parallel execution, since it is the key point for obtaining balanced map tasks and exploiting the parallelism as much as possible. When uniformly distributed datasets are considered, this goal can be easily achieved by partitioning the geometries of the input dataset with a regular grid covering the whole reference space; conversely, with skewed datasets this might not be the right choice, and other techniques have to be applied. For instance, SpatialHadoop can also produce a global index by means of a Quadtree-based or R-tree-based grid, which are in turn more expensive index structures to build. This paper proposes a technique, based on a box-counting function and a heuristic rooted in theoretical properties and experimental observations, for detecting the degree of skewness of an input spatial dataset and then deciding which partitioning technique to apply in order to improve as much as possible the performance of subsequent operations. Experiments on both synthetic and real datasets are presented to confirm the effectiveness of the proposed approach.
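
    The idea of a box-counting check can be sketched as follows: count the non-empty cells of grids of decreasing cell size and estimate the exponent of the resulting power law from the slope of the log-log relationship. For a uniformly distributed 2D dataset this exponent is close to 2, while markedly different values hint at skewness and hence at preferring a Quadtree- or R-tree-based partitioning. This is only an illustration of the principle, not SpatialHadoop's implementation or the exact heuristic of the paper.

```python
# Rough box-counting sketch: estimate the power-law exponent relating the
# number of occupied grid cells to the cell side (illustrative only).
import math
import random

def box_count_exponent(points, space, sizes=(2, 4, 8, 16, 32)):
    (x0, y0, x1, y1) = space
    xs, ys = [], []
    for n in sizes:                      # n x n grid => cell side shrinks with n
        cw, ch = (x1 - x0) / n, (y1 - y0) / n
        occupied = len({(int((px - x0) // cw), int((py - y0) // ch))
                        for px, py in points})
        xs.append(math.log(1.0 / n))     # log of the cell side (up to a constant)
        ys.append(math.log(occupied))
    # least-squares slope of log(occupied cells) vs log(cell side)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

uniform = [(random.random(), random.random()) for _ in range(20000)]
print(round(abs(box_count_exponent(uniform, (0, 0, 1, 1))), 2))  # ~2 if uniform
```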

    Tracking Data Provenance of Archaeological Temporal Information in Presence of Uncertainty

    The interpretation process is one of the main tasks performed by archaeologists who, starting from ground data about evidence and findings, incrementally derive knowledge about ancient objects or events. Very often more than one archaeologist contributes, at different times, to the discovery of details about the same finding, and thus it is important to keep track of the history and provenance of the overall knowledge discovery process. To this aim, we propose a model and a set of derivation rules for tracking and refining data provenance during the archaeological interpretation process. In particular, among all the possible interpretation activities, we concentrate on dating, which archaeologists perform to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis. In this context, we propose a framework to represent and derive updated provenance data about temporal information after the mentioned derivation process. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; thus, we use Fuzzy Logic to assign a degree of confidence to values and Fuzzy Temporal Constraint Networks to model the relationships between the datings of different findings, represented as a graph-based dataset. The derivation rules used to infer more precise temporal intervals are enriched to also manage provenance information and its updates after a derivation step. A MapReduce version of the path consistency algorithm is also proposed to improve the efficiency of the refinement process on big graph-based datasets.
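
    The following toy example shows a single path-consistency refinement step with provenance attached: the constraint A→C is tightened by composing A→B and B→C, and the provenance of the refined constraint becomes the union of the contributing sources. For brevity the intervals are crisp rather than fuzzy, and all names are illustrative, so this is only a sketch of the mechanism described in the abstract.

```python
# One path-consistency refinement step with provenance (crisp simplification).
from dataclasses import dataclass, field

@dataclass
class Constraint:
    lo: float                 # lower bound on the time distance between two datings
    hi: float                 # upper bound
    sources: frozenset = field(default_factory=frozenset)  # who asserted/derived it

def compose(c1: Constraint, c2: Constraint) -> Constraint:
    """Chain A->B and B->C into an implied constraint A->C."""
    return Constraint(c1.lo + c2.lo, c1.hi + c2.hi, c1.sources | c2.sources)

def intersect(c1: Constraint, c2: Constraint) -> Constraint:
    """Tighten a constraint with an implied one; provenance is the union of both."""
    lo, hi = max(c1.lo, c2.lo), min(c1.hi, c2.hi)
    if lo > hi:
        raise ValueError("inconsistent datings")
    return Constraint(lo, hi, c1.sources | c2.sources)

# A->C stated by archaeologist "anna"; A->B and B->C by "marco".
ab = Constraint(10, 50, frozenset({"marco"}))
bc = Constraint(0, 30, frozenset({"marco"}))
ac = Constraint(20, 100, frozenset({"anna"}))
refined = intersect(ac, compose(ab, bc))
print(refined.lo, refined.hi, sorted(refined.sources))   # 20 80 ['anna', 'marco']
```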

    Distributing Tourists Among POIs with an Adaptive Trip Recommendation System

    Traveling is part of many people's leisure activities, and an increasing fraction of the economy comes from tourism. Given a destination, information about the different attractions, or points of interest (POIs), can be found in many sources. Among these attractions, finding the ones that could be of interest to a specific user is a challenging task, and travel recommendation systems deal with this type of problem. Most of the solutions in the literature do not take into account the impact of the suggestions on the level of crowding of POIs. This paper considers the trip planning problem with a focus on balancing users among the different POIs. To this aim, we consider the effects of the previous recommendations, as well as estimates based on historical data, while devising a new recommendation. The problem is formulated as a multi-objective optimization problem, and a recommendation engine has been designed and implemented for exploring the solution space in near real-time, through a distributed version of the Simulated Annealing approach. We test our solution using a real dataset of users visiting the POIs of a touristic city, and we show that we are able to provide high-quality recommendations while keeping the attractions from becoming overcrowded.
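
    A toy, single-machine sketch of the kind of simulated-annealing search involved is shown below: it picks k POIs minimizing a combined score of expected crowding minus user interest. The objective function, cooling schedule and names are assumptions made for illustration; the paper's engine is distributed and uses its own multi-objective formulation.

```python
# Toy simulated annealing for crowd-aware POI selection (illustrative only).
import math, random

def recommend(pois, interest, expected_load, capacity, k, iters=5000, t0=1.0):
    """pois: list of ids; interest/expected_load/capacity: dicts per POI;
    k: number of POIs to recommend. Minimizes crowding minus interest."""
    def cost(sel):
        return sum(expected_load[p] / capacity[p] - interest[p] for p in sel)

    current = random.sample(pois, k)
    best = list(current)
    for i in range(iters):
        t = t0 * (1 - i / iters) + 1e-9                       # linear cooling
        candidate = list(current)
        candidate[random.randrange(k)] = random.choice(pois)  # swap one POI
        if len(set(candidate)) < k:                           # reject duplicates
            continue
        delta = cost(candidate) - cost(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
            if cost(current) < cost(best):
                best = list(current)
    return best

pois = ["arena", "castle", "museum", "bridge", "tower"]
interest = {p: random.random() for p in pois}
load = {p: random.randint(0, 80) for p in pois}
cap = {p: 100 for p in pois}
print(recommend(pois, interest, load, cap, k=3))
```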

    Tracking social provenance in chains of retweets

    In the era of massive sharing of information, the term social provenance is used to denote the ownership, source or origin of a piece of information which has been propagated through social media. Tracking the provenance of information is becoming increasingly important as social platforms acquire more relevance as source of news. In this scenario, Twitter is considered one of the most important social networks for information sharing and dissemination which can be accelerated through the use of retweets and quotes. However, the Twitter API does not provide a complete tracking of the retweet chains, since only the connection between a retweet and the original post is stored, while all the intermediate connections are lost. This can limit the ability to track the diffusion of information as well as the estimation of the importance of specific users, who can rapidly become influencers, in the news dissemination. This paper proposes an innovative approach for rebuilding the possible chains of retweets and also providing an estimation of the contributions given by each user in the information spread. For this purpose, we define the concept of Provenance Constraint Network and a modified version of the Path Consistency Algorithm. An application of the proposed technique to a real-world dataset is presented at the end of the paper
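
    To make the reconstruction problem concrete, here is a simple heuristic (not the Provenance Constraint Network approach of the paper): each retweet is attached to the most recent earlier retweeter that the user follows, falling back to the original author otherwise. The data layout and names are assumptions for illustration.

```python
# Illustrative heuristic for rebuilding a plausible retweet chain.
def rebuild_chain(original_author, retweets, follows):
    """retweets: list of (user, timestamp); follows: dict user -> set of users.
    Returns a list of (user, presumed_source) edges."""
    edges, seen = [], [(original_author, float("-inf"))]
    for user, ts in sorted(retweets, key=lambda r: r[1]):
        candidates = [u for u, t in seen if t < ts and u in follows.get(user, set())]
        source = candidates[-1] if candidates else original_author
        edges.append((user, source))
        seen.append((user, ts))
    return edges

follows = {"bob": {"alice"}, "carol": {"bob"}, "dave": {"alice"}}
print(rebuild_chain("alice", [("bob", 1), ("carol", 2), ("dave", 3)], follows))
# [('bob', 'alice'), ('carol', 'bob'), ('dave', 'alice')]
```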

    CoPart: a context-based partitioning technique for big data

    The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. The consequence is that the overall performance greatly depends on the way data are partitioned among the various computation nodes. The default partitioning technique provided by systems like Hadoop or Spark basically performs a random subdivision of the input records, without considering their nature or the correlations between them. Even if such an approach can be appropriate in the simplest case, where all input records always have to be analyzed, it becomes a limitation for more sophisticated analyses, in which correlations between records can be exploited to prune unnecessary computations in advance. In this paper we design a context-based multi-dimensional partitioning technique, called COPART, which takes data correlation into account in order to determine how records are subdivided between splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
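
    A minimal, one-dimensional sketch of context-based partitioning is shown below: split boundaries are taken from quantiles of a contextual attribute (here a hypothetical "hour" field), so records that are close in that dimension land in the same split and splits stay roughly equal in size even when the attribute is skewed. COPART is multi-dimensional and more sophisticated; this only illustrates the underlying idea.

```python
# Route records to splits by quantiles of one contextual attribute (sketch).
def quantile_boundaries(values, n_splits):
    s = sorted(values)
    return [s[(i * len(s)) // n_splits] for i in range(1, n_splits)]

def assign_split(value, boundaries):
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

records = [{"id": i, "hour": h}
           for i, h in enumerate([1, 1, 2, 2, 2, 3, 9, 22, 23, 23])]
bounds = quantile_boundaries([r["hour"] for r in records], n_splits=3)
splits = {}
for r in records:
    splits.setdefault(assign_split(r["hour"], bounds), []).append(r["id"])
print(bounds, {k: len(v) for k, v in splits.items()})
```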

    Establishing Robustness of a Spatial Dataset in a Tolerance-Based Vector Model

    Spatial data are usually described through a vector model in which geometries are represented by a set of coordinates embedded in a Euclidean space. The use of a finite representation, instead of the real numbers theoretically required, causes many robustness problems which are well known in the literature. Such problems are made even worse in a distributed context, where data are exchanged between different systems and several perturbations can be introduced into the data representation. In this context, a spatial dataset is said to be robust if the evaluation of the spatial relations existing among its objects produces the same result in different systems. In order to discuss the robustness of a spatial dataset, two implementation models have to be distinguished, since they determine different ways of evaluating the relations existing among geometric objects: the identity model and the tolerance model. The robustness of a dataset in the identity model has been widely discussed in [Belussi et al., 2012, Belussi et al., 2013, Belussi et al., 2015a], and some algorithms of the Snap Rounding (SR) family [Hobby, 1999, Halperin and Packer, 2002, Packer, 2008, Belussi et al., 2015b] can be successfully applied in that context. Conversely, this problem has been less explored in the tolerance model. The aim of this paper is to propose an algorithm, inspired by those of the SR family, for establishing or restoring the robustness of a vector dataset in the tolerance model. The main ideas are to introduce an additional operation which spreads, instead of snapping, geometries, in order to preserve the original relation between them, and to use a tolerance region for this operation instead of a single snapping location. Finally, some experiments on real-world datasets are presented, which confirm that the proposed algorithm can establish the robustness of a dataset.
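
    The "spread" idea can be illustrated on a single vertex: if a point that should remain disjoint from a segment falls inside the segment's tolerance region, it is pushed outward along the perpendicular until it lies safely outside that region, rather than being snapped onto the segment. This is a simplified, single-case illustration under assumed names, not the algorithm proposed in the paper.

```python
# Push a point out of a segment's tolerance region instead of snapping it.
import math

def spread_point(p, a, b, tol):
    """p, a, b: (x, y) tuples; tol: tolerance radius around segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy, seg2 = bx - ax, by - ay, (bx - ax) ** 2 + (by - ay) ** 2
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    cx, cy = ax + t * dx, ay + t * dy          # closest point on the segment
    dist = math.hypot(px - cx, py - cy)
    if dist >= tol:
        return p                               # already robustly disjoint
    # Move p away from the segment to just beyond the tolerance region.
    if dist > 0:
        ux, uy = (px - cx) / dist, (py - cy) / dist
    else:                                      # p lies exactly on the segment
        ux, uy = -dy / math.sqrt(seg2), dx / math.sqrt(seg2)
    return (cx + ux * tol * 1.01, cy + uy * tol * 1.01)

print(spread_point((1.0, 0.2), (0.0, 0.0), (2.0, 0.0), tol=0.5))  # y pushed to ~0.505
```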